Journal of Chemical Information and Modeling

American Chemical Society (ACS)

Preprints posted in the last 90 days, ranked by how well they match Journal of Chemical Information and Modeling's content profile, based on 207 papers previously published here. The average preprint has a 0.22% match score for this journal, so anything above that is already an above-average fit.

1
Comparative Analysis of Relative Ligand Binding Free Energy Simulation Methods: Amber-TI, GROMACS-NETI, OpenMM-FEP, and BLaDE-MSLD

Lee, H.; Kim, I.; Kim, S.; Bae, M.; Jeong, B.; Kim, S.; Jo, S.; Lee, J.; Im, W.

2026-04-24 biophysics 10.64898/2026.04.22.720125 medRxiv
Top 0.1%
68.2%

Structure-based drug design has become increasingly important in the pharmaceutical industry for accelerating the discovery of effective drug candidates. In particular, ligand binding free energy serves as a critical metric for predicting drug efficacy during the key stages of hit discovery and lead optimization. Continuous progress has been made in the prediction of ligand binding free energies, but direct comparisons of different methods using the same force field remain challenging due to their unique implementations in different simulation engines. In this study, we present a direct comparison of four popular methodologies (Amber-TI, GROMACS-NETI, OpenMM-FEP, and BLaDE-MSLD) for calculating relative binding free energies (ΔΔG_bind) with the same Amber protein and ligand force fields using MolCube Alchemical Free Energy Simulator (MolCube-AFES), which provides an input generation workflow to support ΔΔG_bind calculations of all four methods. We used 80 alchemical transformations (from the JACS benchmark set by Wang et al.) and two additional applications to compare the predicted ΔΔG_bind from the four methods against experimental measurements. All four methods reproduced experimentally observed trends, with most transformations within ±2 kcal/mol of experiment, and showed broadly comparable accuracy with no statistically significant performance differences across the benchmark dataset. These results demonstrate that MolCube-AFES enables controlled, cross-platform benchmarking and show that all four alchemical free energy methods deliver statistically equivalent accuracy, with method selection guided by workflow requirements such as throughput, portability, and perturbation network design rather than expected differences in performance.

2
Evolutionary exploration of drug-like chemical space utilizing generative AI and virtual screening

Secker, C.; Secker, P.; Yergoez, F.; Celik, M. O.; Chewle, S.; Phuong Nga Le, M.; Masoud, M.; Christgau, S.; Weber, M.; Gorgulla, C.; Nigam, A.; Pollice, R.; Schuette, C.; Fackeldey, K.

2026-03-30 bioinformatics 10.64898/2026.03.26.714527 medRxiv
Top 0.1%
62.8%

The identification of suitable lead molecules in the vast chemical space is a critical and challenging task in drug discovery campaigns. Recently, it has been demonstrated that large-scale virtual screening provides a powerful approach to accelerate the identification of novel drug candidates by screening ever-increasing virtual ligand libraries, which have reached magnitudes of >10^20 compounds. However, this desirable increase in potentially bioactive molecules poses a new challenge as enumerating and virtually screening such huge compound libraries is computationally prohibitive. Consequently, advanced approaches to navigate ultra-large chemical spaces and to identify suitable candidate molecules therein are urgently needed. Here, we present an evolutionary algorithm framework using molecular generative AI, reaction-based substructure searching, and iterative model fine-tuning for a targeted and efficient exploration of chemical fragment spaces. Combining this approach with large-scale virtual screening we are able to identify target-specific candidate molecules within the commercially available Enamine REAL Space (~10^15). We demonstrate the applicability of the approach by successfully identifying and biochemically validating pH-specific ligands of the μ-opioid receptor. Our results demonstrate that integrating generative AI with evolutionary algorithms provides a promising route to explore ultra-large chemical spaces for the discovery of novel, synthetically accessible lead molecules.

3
G-screen: Scalable Receptor-Aware Virtual Screening through Flexible Ligand Alignment

Jung, N.; Park, H.; Yang, J.; Seok, C.

2026-03-05 biophysics 10.64898/2026.03.03.707320 medRxiv
Top 0.1%
62.7%

Virtual screening has long been a central computational tool for rational ligand discovery, enabling the systematic prioritization of candidate molecules from large chemical libraries. Although docking and related approaches that explicitly account for receptor-ligand interactions have been developed and refined over several decades, achieving both reliable receptor-aware interaction modeling and computational scalability remains an open challenge, particularly for ultra-large chemical spaces. Ligand-based methods are fast and robust but do not explicitly incorporate receptor structure, whereas docking-based approaches model receptor-ligand interactions more directly at substantially higher computational cost. Here, we present G-screen, a freely available and scalable receptor-aware virtual screening framework designed for cases in which a reference protein-ligand complex structure is available. Instead of performing full docking, G-screen rapidly aligns candidate ligands to the reference ligand using a flexible global alignment algorithm (G-align) and evaluates receptor-aware pharmacophore interactions derived from the reference complex, thereby combining the efficiency of ligand-based alignment with explicit atomic-level interaction analysis. Benchmarking on DUD-E, LIT-PCBA, and MUV datasets demonstrates that G-screen achieves competitive discrimination and early enrichment relative to representative ligand-based and docking-based methods, while maintaining millisecond-scale per-molecule runtimes under multi-threaded execution. These results position G-screen as a practical and scalable receptor-aware screening strategy for efficiently filtering large chemical libraries when a reference complex structure is available. Scientific Contribution: We have developed a scalable virtual screening framework for efficiently filtering ultra-large chemical libraries using a flexible global alignment algorithm combined with receptor-aware pharmacophore evaluations.
Despite explicitly capturing atomic-level interactions, the screening process using this method is highly efficient, maintaining millisecond-scale per-molecule runtimes under parallel execution. It achieves competitive discrimination and early enrichment, successfully bridging the speed of ligand-based approaches with the structural context of traditional docking.

4
Integrating the MARTINI2 Coarse-Grained Force Field into HADDOCK3 for Faster Modelling of Large Biomolecular Complexes

Versini, R.; Reys, V. G. P.; Kravchenko, A.; Honorato, R. V.; Bonvin, A. M. J. J.

2026-04-27 bioinformatics 10.64898/2026.04.25.720800 medRxiv
Top 0.1%
58.6%

The integration of coarse-grained (CG) approaches into docking workflows offers a powerful strategy for modelling large biomolecular assemblies with reduced computational costs. We present here the implementation of the MARTINI2 coarse-grained force field into the HADDOCK3 integrative modelling platform. This development enables the use of the CG representations and parameters within HADDOCK3 for efficient sampling and scoring of large protein-protein complexes. The implementation takes advantage of the modular and flexible architecture of HADDOCK3, allowing a seamless combination of MARTINI2 representation with the various modules. Conversion from and to all-atom models is integrated into the coarse-grained modelling workflow. The performance of the protocol is first assessed on protein-protein and protein-DNA benchmarks and then illustrated on a few representative large-scale systems, demonstrating a significant reduction in computational costs while maintaining biologically relevant accuracy.

5
Conformational Preference Classification of Integrin-Binding Ligands Using Free Energy Perturbation

Vögele, M.; Shahoei, R.; Petridis, L.; Li, J.; Lin, F.-Y.; Wang, L.; Springer, T. A.; Vendome, J.

2026-04-30 biophysics 10.64898/2026.04.27.721214 medRxiv
Top 0.1%
58.6%

Integrins are crucial cell adhesion receptors and attractive therapeutic targets, but developing oral small-molecule inhibitors has been challenging, at least in part due to inadvertent partial agonism caused by stabilization of the integrin's open, high-affinity state. To address this challenge, we present a computational approach using Absolute Binding Free Energy Perturbation (AB-FEP) calculations to predict whether a ligand will stabilize the open or closed integrin states, leveraging the difference between the ligand's binding free energy to the respective end states. Despite challenges posed by Ca and Mg ions, metal-coordinating residues in the binding pocket, and the subtlety of structural differences between states, AB-FEP achieves excellent classification performance on a set of known opening and closing ligands, significantly outperforming docking scores and MM-GBSA results. We also show a good correlation between AB-FEP binding free energy differences and experimental values. Furthermore, AB-FEP provides insights into intermediate integrin states, and analysis of simulation trajectories confirmed the formation of a water-mediated hydrogen bond network with an ion in the binding pocket to be characteristic of closing ligands. This work demonstrates AB-FEP as a robust method for classifying integrin ligands and understanding their functional mechanisms, offering valuable guidance for designing safe and conformationally selective integrin therapeutics.

6
Structure-Based and Stability-Validated Prioritization of BACE1 Inhibitors Integrating Meta-Ensemble QSAR and Molecular Dynamics

Chowdhury, T. D.; Shafoyat, M. U.; Hemel, N. H.; Nizam, D.; Sajib, J. H.; Toha, T. I.; Nyeem, T. A.; Farzana, M.; Haque, S. R.; Hasan, M.; Siddiquee, K. N. e. A.; Mannoor, K.

2026-04-10 bioinformatics 10.64898/2026.04.07.716920 medRxiv
Top 0.1%
55.6%

Alzheimer's disease remains a major therapeutic challenge, and no β-secretase (BACE1) inhibitor has achieved clinical approval. A key limitation of prior discovery efforts is reliance on single-parameter optimization, often resulting in candidates with limited translational potential. In this study, we developed a biology-informed computational framework integrating meta-ensemble QSAR modeling, molecular docking, Protein Language Model (ESM-1b)-guided residue interaction weighting, and ADMET profiling within a normalized multi-parameter ranking scheme. Model performance was validated using cross-validation, external validation, and Y-randomization (n = 100; p = 0.009), while applicability domain analysis based on Tanimoto similarity highlighted reduced reliability for extrapolative predictions. Sensitivity analysis showed high ranking stability under moderate perturbations (Spearman ρ = 0.998 for ±10%; 0.963 for ±25%), with reduced agreement under randomized weighting (ρ = 0.821), indicating that prioritization is robust but influenced by weight selection. Screening of 16,196 compounds identified 153 predicted actives (accuracy = 0.852; ROC-AUC = 0.920), which were refined to 111 candidates and seven prioritized leads. Molecular dynamics simulations (200 ns) indicated stable binding and persistent catalytic interactions, with Mol-2 showing favorable dynamic stability and ADMET characteristics. Overall, this study presents an interpretable and quantitatively evaluated framework for multi-parameter compound prioritization, supporting more reliable virtual screening in early-stage CNS drug discovery.

7
Deciphering the Molecular Structure of the Type III Secretion System in Chlamydia trachomatis for Structure-Based Therapeutic Targeting

Panda, A.; Kapoor, J.; Rajagopal, R.; Kumar, S.; Bandyopadhyay, A.

2026-05-09 bioinformatics 10.64898/2026.05.06.723290 medRxiv
Top 0.1%
55.2%

Chlamydia trachomatis is an obligate intracellular Gram-negative pathogen responsible for sexually transmitted infections and trachoma in humans. Although antibiotics are generally effective against acute infections, persistent chlamydial forms often exhibit reduced susceptibility during chronic infection. Chlamydia relies on its type III secretion system (T3SS) to inject effector proteins into host cells, making T3SS proteins attractive targets for antivirulence therapeutics. In this study, we employed an integrated computational pipeline to model and assemble the C. trachomatis T3SS constituent proteins. Template-based modeling using crystallographic structures of homologs from other Gram-negative bacteria revealed a highly conserved structural architecture despite low sequence identity (18-46%). Stereochemical validation confirmed high model quality, with most T3SS proteins exhibiting favorable protein-protein interactions (PPIs). Since the activity of the T3SS complex relies on extensive PPIs, we targeted these PPIs as a promising approach to attenuate bacterial virulence. CdsN, which functions as an ATPase of the T3SS, is a hexamer of which we targeted the dimerization interface. Structure-based virtual screening of compounds from the e-Drug3D and IMPPAT libraries against predicted hotspot residues and the identified druggable pocket at the CdsN dimeric interface, followed by ADMET screening, yielded three promising candidates: M Roflumilast (Drug ID: 1537), Elacestrant (Drug ID: 2081), and Tecovirimat (Drug ID: 1889). All three ligands formed thermodynamically stable complexes with the CdsN dimer, with Elacestrant demonstrating the most favourable binding free energy. This was also confirmed by 100 ns molecular dynamics simulation. This study provides new insights into the molecular architecture of C. trachomatis T3SS and identifies M Roflumilast, Elacestrant, and Tecovirimat as potential drug candidates against chlamydial infection. 

8
An Open-Source Reproducible Workflow for Pocket-Oriented Virtual Screening and ADME-Integrated Chemoinformatics: A Multi-Target Flavivirus Case Study

Teixeira, J. P.; Bajay, M. M.; Freire, C. C. d. M.; Bettin, L. B. F.; Soares, A. P.; de Lima Neto, D. F.

2026-04-29 bioinformatics 10.64898/2026.04.28.721199 medRxiv
Top 0.1%
54.2%

Zika virus (ZIKV), yellow fever virus (YFV), West Nile virus (WNV), Usutu virus (USUV), and Saint Louis encephalitis virus (SLEV) remain major public health concerns, yet broad-spectrum antiviral options are limited. Here, we present an open-source, reproducible software workflow for pocket-oriented virtual screening and ADME-integrated chemoinformatics, designed to support standardized multi-target compound prioritization. As a case study, the workflow was applied to structural and nonstructural proteins from clinically relevant flaviviruses. Automated pocket detection using Concavity reduces site-selection bias by generating docking boxes from surface concavity clusters, while standardized downstream scripts parse docking logs, convert docking-derived binding energies into Kd-related metrics, integrate SwissADME descriptors, and compute LE, LLE, FQ, and drug-likeness rules. The framework also supports retrospective validation and comparative benchmarking using literature-supported reference compounds and target-specific plausibility checks. Rather than proposing experimentally validated antiviral candidates, this study provides a reusable computational framework for hypothesis generation, benchmarking, and downstream experimental prioritization in structure-based drug discovery. The workflow is modular and adaptable to other multi-target screening campaigns where integrated ranking across binding, physicochemical, and ADME dimensions is required. Summary: We describe an open-source, reproducible software workflow that integrates pocket-oriented docking, ligand efficiency scoring, ADME descriptor integration, and multivariate chemoinformatics to standardize compound prioritization across multiple protein targets. The workflow combines open-source tools with auditable Bash, R, and Python scripts and is demonstrated through a multi-target flavivirus case study.
Rather than claiming experimentally validated antiviral activity, the framework is intended to support hypothesis generation, retrospective benchmarking, transparent reporting, and downstream experimental prioritization.
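
The abstract above mentions converting docking-derived binding energies into Kd-related metrics and computing ligand efficiency (LE) and lipophilic ligand efficiency (LLE). The following is a minimal sketch of those standard relations, not the authors' scripts: it assumes energies in kcal/mol, a temperature of 298.15 K, the textbook relation ΔG = RT ln(Kd), and LLE defined with pKd in place of pIC50. All function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol, temperature_k=298.15):
    """Estimate a dissociation constant (mol/L) from dG = RT ln(Kd)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


def ligand_efficiency(dg_kcal_per_mol, heavy_atoms):
    """LE: binding free energy per heavy atom, reported as a positive number."""
    return -dg_kcal_per_mol / heavy_atoms


def lipophilic_ligand_efficiency(kd_molar, logp):
    """LLE approximated here as pKd - logP (assumption: pKd stands in for pIC50)."""
    return -math.log10(kd_molar) - logp


if __name__ == "__main__":
    dg = -8.0  # illustrative docking score in kcal/mol, not a value from the study
    kd = kd_from_dg(dg)
    print(f"Kd  = {kd:.2e} M")
    print(f"LE  = {ligand_efficiency(dg, heavy_atoms=20):.2f} kcal/mol per heavy atom")
    print(f"LLE = {lipophilic_ligand_efficiency(kd, logp=2.5):.2f}")
```

Docking scores are only rough proxies for binding free energies, so Kd values derived this way are best read as ranking aids rather than measured affinities.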

9
DyME: An MD-based engine exploiting HTP mutagenesis for protein engineering and recognition mimicry

Guillem-Gloria, P. M.; Ruiz-Gomez, G.; Pisabarro, M. T.

2026-04-13 bioinformatics 10.64898/2026.04.10.717642 medRxiv
Top 0.1%
53.2%

Protein recognition mimicry is of great interest in the field of molecular bioengineering and rational design, with mutagenesis frequently employed to analyze the effects of altering amino acids involved in molecular recognition. The conformational and energetic effects of such alterations can be investigated in detail with the help of molecular dynamics (MD) methodologies. While existing MD-based computational tools can be used to explore a particular set of mutations at a time, suitable for small-scale studies, high-throughput (HTP) exploration of protein recognition for engineering purposes would greatly benefit from an integrative platform that streamlines preparation, mutagenesis, simulation and post-processing of up to several thousand molecular systems, along with robust tools for comprehensive and straightforward comparative analysis. DyME (Dynamic Mutagenesis Engine) is a distributed platform that enables systematic investigations of protein recognition mimicry by combining HTP mutagenesis, solvated MD simulations and a Toolbox for comparative analysis (TCA), including interfacial water-site mapping. DyME uses 3D structural information of any protein-protein or protein-DNA complex as input. Its automated MD-based mutagenesis engine facilitates systematic investigation of how site-specific alterations affect recognition, enabling the organization of single, double and triple modifications into combinatorial libraries for comprehensive comparative analysis. In DyME, relevant MD trajectory-derived data are extracted and stored in a central database, providing aggregation capabilities that ease multi-feature analysis across an extensive collection of simulations. An interactive web-GUI and specialized widgets simplify preparation and efficient molecular and numerical comparative exploration. DyME's capabilities are evaluated using available experimental data. Its source code is available at https://github.com/pisabarro-group/DYME

10
Integrating computational chemistry and machine learning to predict KRAS mutation-induced resistance

Mizgalska, K.; Urbaniak, K.; Imbody, D. J.; Haura, E. B.; Guida, W. C.; Branciamore, S.; Karolak, A.

2026-04-11 biophysics 10.64898/2026.04.10.717640 medRxiv
Top 0.1%
52.3%

Mutation-induced drug resistance is a major contributor to the failure of targeted cancer therapies, particularly in tumors driven by mutations in the KRAS oncogene. Although covalent inhibitors effectively target KRAS G12C, secondary mutations such as G12C/Y96C, G12C/Y96S, and G12C/Y96D lead to resistance despite leaving the covalent attachment site intact. To predict these resistance outcomes, we developed a computational framework that integrates molecular dynamics-derived structural, energetic, thermodynamic, and contact-based descriptors with machine learning. Features extracted from simulations of treatment-sensitive and treatment-resistant KRAS mutants were used to train logistic regression, random forest, support vector machine, and Bayesian Network classifiers, achieving average accuracies above 90%. Solvent-accessible surface area variability, Lennard-Jones 1,4 energy, mean square displacement, and root mean square fluctuation emerged as the most discriminatory features. Residues G10, E62, and H95 showed the highest predictive value. This approach highlights conformational and solvent-exposure changes as central drivers of KRAS drug resistance and provides a generalizable workflow for other clinically relevant mutant targets. Author Summary: Mutation-induced resistance is a common challenge across many cancer types and is often associated with aggressive tumor progression and poor therapeutic response. Investigating the dynamic properties of proteins harboring such mutations provides valuable insights into the structural and functional consequences of these alterations, thereby helping to elucidate the mechanisms of drug resistance. Machine learning algorithms are particularly effective at uncovering complex patterns within high-dimensional data, such as molecular dynamics simulation trajectories.
Integrating these algorithms with analysis of protein dynamics holds significant potential to aid in drug discovery challenges by reducing both time and resource demands while increasing the likelihood of identifying effective therapeutic candidates. As a proof of concept, we developed a computational framework that integrates molecular dynamics-derived molecular features with machine learning to distinguish treatment-sensitive from treatment-resistant KRAS mutants. KRAS is known for drug resistance arising from secondary mutations that disrupt inhibitor binding despite intact covalent attachment sites. The models achieved over 90% accuracy and identified solvent-exposure and conformational changes at residues G10, E62, and H95 as key predictors of treatment resistance. This workflow offers a generalizable strategy for understanding and forecasting mutation-induced resistance.

11
MOZAIC: Compound Growth via In Silico Reactions and Global Optimization using Conformational Space Annealing

Yoo, J.; Shin, W.-H.

2026-03-10 bioinformatics 10.64898/2026.03.07.710272 medRxiv
Top 0.1%
52.2%

Motivation: Fragment-based drug discovery (FBDD) is an efficient strategy that leverages small molecular fragments to explore broader chemical space by combining them. Advances in computational methods have enabled the calculation of molecular properties and docking scores, thereby accelerating the development of algorithm- and AI-based approaches in FBDD. However, certain methods do not provide synthetic pathways to obtain the proposed compounds. Consequently, these molecules might not be easily synthesized. Results: In light of these developments, we propose MOZAIC, a novel framework that explores chemical space using reaction-based fragment growing and Conformational Space Annealing, a powerful global optimization algorithm. Our results show that MOZAIC effectively produces chemically diverse molecules with balanced improvements in lead-like properties, including QED, synthetic accessibility, and binding affinity. Furthermore, its flexible objective function allows fine-tuning for specific design goals, such as enhancing solubility together with binding affinity. These capabilities position MOZAIC as a valuable platform for advancing fragment-to-lead and lead optimization efforts in drug discovery. Availability and implementation: MOZAIC is available at https://github.com/kucm-lsbi/MOZAIC/. Supplementary Information: Supplementary data are available at Bioinformatics online.

12
CGAgentX: Agentic AI Framework to Develop Transferable Coarse-Grained Models

Deshmukh, S. A.; Seth, S.

2026-04-18 biophysics 10.64898/2026.04.17.719081 medRxiv
Top 0.1%
51.2%

We present CGAgentX, a general autonomous multi-agent framework in which specialized LLM-based agents coordinate the optimization of coarse-grained (CG) model parameters to reproduce target properties. Using polar solvents -- dimethyl sulfoxide (DMSO) and N,N-dimethylacetamide (DMA) -- as representative case studies, we demonstrate the framework's capability to develop CG models that accurately reproduce key properties from atomistic simulations and experimental literature. Six specialized agents -- Mapping, Topology, Boundary, Hypothesis, Diagnostic, and Optimization -- operate under a Master Agent that orchestrates closed-loop, iterative parameter refinement by autonomously invoking external tools, including molecular dynamics (MD) simulations and analysis workflows, and evaluating outputs through a fitness function. Central to the framework is a Hypothesis Agent that generates and verifies physically motivated parameter hypotheses by coordinating parallel multi-fork simulations, wherein multiple candidate parameter sets are evaluated simultaneously. This multi-fork strategy expands parameter space exploration, yielding richer datasets that enable more accurate hypothesis refinement across iterations. Agents adaptively propose parameter updates based on intermediate simulation outcomes, enabling efficient navigation of complex trade-offs among structural, thermodynamic, and transport properties. The framework reproduces key experimental properties within 5% accuracy while maintaining consistency with atomistic reference behavior, achieving convergence without manual intervention. The modular architecture is readily extensible to other molecular systems and can accommodate additional targets, constraints, or simulation engines, providing a general agentic-AI platform for transferable CG model development.

13
TwinSAR: An Adaptive Kernel-based Algorithm with logit-transformed Z-score Filtering for Chemical Twin Detection in Large-scale Virtual Screening

Haris Kulosmanovic, H.; Uguz, C.; DURDAGI, S.

2026-05-15 bioinformatics 10.64898/2026.05.12.724687 medRxiv
Top 0.1%
46.4%

Molecular similarity searching is a workhorse of cheminformatics, but the dominant Tanimoto/topological-fingerprint paradigm has well-known blind spots. It is highly sensitive to molecular size, suffers from steep activity cliffs, and frequently fails to retrieve scaffold-hopping bioisosteres. A complementary descriptor that has received comparatively little attention is global elemental composition. Despite the conceptual simplicity of comparing molecules by their elemental ratios, no widely deployed method exists for the statistically rigorous identification of "chemical twins" defined by stoichiometric proximity. We address this gap with TwinSAR (Stoichiometric Analysis and Retrieval), an adaptive kernel-based algorithm that combines three methodological innovations: (i) binary fingerprint blocking that partitions molecules by element-presence patterns and reduces the cost of all-pairs comparison from O(NM) to O(Σ n_i m_i), enabling million/billion-scale searches; (ii) a per-block adaptive radial basis function (RBF) kernel whose precision parameter is calibrated independently for each fingerprint block via the median heuristic, providing fair similarity comparison across chemical sub-spaces of vastly different density; and (iii) a logit-transformed Z-score filter that maps bounded RBF scores onto an unbounded scale, allowing high-similarity pairs to be prioritized relative to the empirical score distribution of their own fingerprint block. TwinSAR is offered in two operating modes: (i) a deterministic BULK mode for exact reproducibility; and (ii) a stochastic FAST mode that achieved a 3.29x wall-clock speed-up in the present benchmark while preserving similar unique-query and unique-target coverage.
Statistical validation showed that detected twin pairs are 12.7x more similar in absolute ratio space than block-matched random pairs (p < 0.001), while a column-permutation negative control returned a median of zero spurious twins across three independent permutations. A controlled benchmark further established that an 8-element representation (single-element heavy-atom ratios) is sensitivity-equivalent to a comprehensive 254-element representation while running 3.55x faster. As a case study, TwinSAR was deployed in an end-to-end virtual screening pipeline against the BCL-2 target protein, where it reduced a 327,071-compound commercial library to a focused panel of 390 candidates. The chemical interpretability of the retrieved twins is illustrated by their structural diversity around conserved heavy-atom skeletons. TwinSAR therefore provides a fast, conformation-free, and statistically principled prefilter that is fully orthogonal to topological fingerprints.
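
The three ingredients named in the TwinSAR abstract (element-presence blocking, a per-block RBF kernel with a median-heuristic bandwidth, and a logit-transformed Z-score) can be illustrated with a small sketch. This is not the TwinSAR implementation: the function names, the choice gamma = 1 / median(squared distance), and the clipping constant are assumptions made here for illustration, and the input vectors are made-up elemental ratios.

```python
import math
from collections import defaultdict
from statistics import mean, median, pstdev


def block_by_presence(ratio_vectors):
    """Group molecule indices by which elements are present (ratio > 0)."""
    blocks = defaultdict(list)
    for idx, vec in enumerate(ratio_vectors):
        blocks[tuple(x > 0 for x in vec)].append(idx)
    return dict(blocks)


def sq_dist(a, b):
    """Squared Euclidean distance between two ratio vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def median_heuristic_gamma(vectors):
    """Assumed form of the median heuristic: gamma = 1 / median(nonzero d^2)."""
    d2 = [sq_dist(a, b) for i, a in enumerate(vectors) for b in vectors[i + 1:]]
    positive = [d for d in d2 if d > 0]
    return 1.0 / median(positive) if positive else 1.0


def rbf(a, b, gamma):
    """RBF similarity in (0, 1]; identical vectors score exactly 1."""
    return math.exp(-gamma * sq_dist(a, b))


def logit(score, eps=1e-9):
    """Map a bounded score onto an unbounded scale, clipping away 0 and 1."""
    s = min(max(score, eps), 1.0 - eps)
    return math.log(s / (1.0 - s))


def logit_z_scores(scores):
    """Z-score each logit against the distribution of its own block."""
    logits = [logit(s) for s in scores]
    mu, sigma = mean(logits), pstdev(logits)
    if sigma == 0:
        return [0.0] * len(logits)
    return [(value - mu) / sigma for value in logits]


if __name__ == "__main__":
    # Hypothetical (C, N, O) ratio vectors for three molecules.
    mols = [(0.5, 0.5, 0.0), (0.6, 0.4, 0.0), (0.5, 0.0, 0.5)]
    print(block_by_presence(mols))
    block = [mols[0], mols[1]]
    gamma = median_heuristic_gamma(block)
    print(rbf(block[0], block[1], gamma))
    print(logit_z_scores([0.1, 0.5, 0.9]))
```

Because pairs are only compared within a block, the number of kernel evaluations scales with the sum of per-block products rather than with the full N x M grid, which is the cost reduction the abstract describes.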

14
LigandForge: A Web Server for Structure-Guided De Novo Drug Design

Nada, H.; Sipos-Szabo, L.; Bajusz, D.; Keseru, G.; Gabr, M.

2026-04-03 bioinformatics 10.64898/2026.03.31.715741 medRxiv
Top 0.1%
43.3%

Despite advances in computational drug discovery, de novo drug design remains hindered by high licensing costs and the need for specialized programming expertise. We present LigandForge, a webserver for structure-guided de novo ligand generation. LigandForge integrates structural validation and binding-site characterization; voxel-based property grid construction for spatial mapping of electrostatics and hydrophobicity; chemistry-aware fragment assembly; multi-objective lead optimization; and retrosynthetic feasibility analysis. The platform utilizes a structure-guided framework to assemble molecules from curated fragment libraries while enforcing physicochemical constraints, including molecular weight, LogP, and hybridization states. Generated molecules are refined via reinforcement learning and genetic algorithms and are subsequently evaluated using composite metrics such as the quantitative estimate of drug-likeness. By leveraging RDKit for cheminformatics and NGL viewer for real-time 3D visualization, LigandForge provides a synthesis-aware environment that bridges the gap between macromolecular structural data and experimentally feasible lead compounds without requiring local software installation.

15
Cyclic peptides space: The methodology of sequence selection to cover the comprehensive physical properties

Tsuchihashi, R.; Kinoshita, M.

2026-03-12 bioinformatics 10.64898/2026.03.10.710724 medRxiv
Top 0.1%
42.1%

Cyclic peptides have emerged as a pivotal modality for next-generation therapeutics, due to their superior biocompatibility, high selectivity, and structural stability. While AI-driven peptide design has advanced rapidly, conventional optimization algorithms are often constrained by initialization biases, which impede the efficient exploration of the vast chemical space. Here, we propose a novel methodology that integrates the protein language model ESM-2 with cyclic permutation averaging of embeddings to resolve this bottleneck. This approach establishes a comprehensive "peptide space", a high-dimensional vector representation that encapsulates the physicochemical and structural attributes of cyclic peptides. Our analysis reveals that random sequence selection results in a heterogeneous distribution within this space, potentially underrepresenting specific functional regions. Conversely, navigating this defined peptide space enables the selection of libraries that uniformly span diverse molecular properties. In a proof-of-concept study designing binders for β2-microglobulin (β2m), we demonstrate that initial sequences uniformly sampled from our peptide space yield superior candidates more efficiently than those derived from random selection. Furthermore, this framework facilitates the quantitative assessment of mutational perturbations on global peptide properties, supporting rational decision-making for both broad exploration and local optimization. This "peptide space" concept provides a foundational framework for defining appropriate search boundaries and enhancing computational efficiency in AI-mediated drug discovery.
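
The idea of cyclic permutation averaging can be sketched independently of any language model: a cyclic peptide has no unique start residue, so averaging a position-dependent embedding over all rotations of the sequence gives a representation that does not depend on where the sequence is written from. In the sketch below, `toy_embed` is a hypothetical stand-in for a real sequence embedder such as ESM-2 and has no biological meaning; only the averaging step reflects the technique named in the abstract.

```python
def rotations(seq):
    """All cyclic rotations of a sequence string."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def toy_embed(seq, dim=4):
    """Hypothetical position-dependent embedding (placeholder for a real model)."""
    vec = [0] * dim
    for pos, residue in enumerate(seq, start=1):
        for k in range(dim):
            vec[k] += pos * ((ord(residue) * (k + 1)) % 7)
    return vec


def cyclic_average_embedding(seq, embed=toy_embed):
    """Average the embedding over every rotation of the cyclic sequence."""
    embeddings = [embed(rot) for rot in rotations(seq)]
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


if __name__ == "__main__":
    a, b = "ACDK", "CDKA"  # the same cyclic peptide written from two start points
    print(toy_embed(a), toy_embed(b))  # linear embeddings differ
    print(cyclic_average_embedding(a))  # averaged embeddings agree
    print(cyclic_average_embedding(b))
```

With a real model, `embed` would return a pooled per-sequence vector, and the cost grows linearly with peptide length because each rotation requires one forward pass.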

16
The genetically-encoded amino acids distribute non-randomly within a functionally-relevant chemical space

Brown, S. M.; Hervey, J.; Dean, S. N.; Vora, G. J.

2026-05-07 synthetic biology 10.64898/2026.05.06.723277 medRxiv
Top 0.1%
41.0%
Show abstract

The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in primarily two structurally-relevant physicochemical properties: hydrophobicity and molecular volume, and to a lesser extent charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the highest occupied molecular orbital and lowest unoccupied molecular orbital (HOMO-LUMO) gaps for 84 xeno amino acids and constructed 10 million random 20-mer amino acid alphabets to determine where C20 fit amongst this background. The HOMO-LUMO gap measurements demonstrated that C20, similar to hydrophobicity and volume, also exhibits a non-random distribution. However, unlike hydrophobicity and volume, this distribution is non-random across an unevenly broad range. The results expand upon previous theory and suggest HOMO-LUMO gap energies as one property that synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets.

Highlights:
- Life's amino acid alphabet is non-randomly distributed within an expanded computationally-generated chemistry space generated from large-scale quantum chemistry simulations.
- Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties like reactivity as measured by frontier molecular orbitals.
- Findings here provide a theoretical framework to guide the design of novel proteins and development of synthetic amino acid alphabets.
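
The comparison of C20 against a background of random alphabets can be sketched as a simple Monte Carlo test. The sketch rests on two assumptions that are not stated in the abstract: that random alphabets are drawn without replacement from one pool of property values, and that the value range is the coverage statistic. The function names are hypothetical, and any input values used with it would have to come from the actual calculations.

```python
import random
from typing import Sequence


def value_range(values: Sequence[float]) -> float:
    """Spread covered by a set of property values (assumed coverage statistic)."""
    return max(values) - min(values)


def fraction_with_wider_range(
    pool: Sequence[float],
    reference: Sequence[float],
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Fraction of random alphabets whose range is at least the reference range.

    Each random alphabet has the same size as the reference set and is
    drawn without replacement from the pool.
    """
    rng = random.Random(seed)
    candidates = list(pool)
    k = len(reference)
    target = value_range(reference)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(candidates, k)
        if value_range(sample) >= target:
            hits += 1
    return hits / n_samples
```

A small returned fraction would indicate that few random alphabets cover as broad a range as the reference set. Other statistics, such as the evenness of spacing, could be substituted for `value_range` in the same loop.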

17
DESPOT: Direction-Enhanced Scoring POTentials

Poelmans, R.; Bruncsics, B.; Arany, A.; Van Eynde, W.; Shemy, A.; Moreau, Y.; Voet, A. R.

2026-04-02 bioinformatics 10.64898/2026.03.31.714140 medRxiv
Top 0.1%
38.8%
Show abstract

Knowledge-based potentials (KBPs) have long been used to score protein-ligand interactions, yet existing formulations remain isotropic, capturing only distance dependencies and neglecting the directional preferences that govern molecular recognition. Here, we introduce Direction-Enhanced Scoring POTentials (DESPOT), an anisotropic knowledge-based framework that unifies pose scoring and binding-site characterisation within a single probabilistic model. The new probabilistic formulation used in DESPOT naturally supports directional modelling through atom type-specific local reference frames and symmetry-aware geometric discretisation. It also supports steric exclusion, encoded as a dedicated void state that explicitly captures the probability that a spatial bin remains unoccupied. The anisotropic interaction profiles learned by DESPOT reveal systematic directional preferences for interactions such as hydrogen bonds, aromatic interactions, and halogen bonds, that extend beyond idealised geometric models. Evaluation on the CASF-2016 benchmark shows that DESPOT substantially outperforms isotropic KBPs in all pose-discrimination and virtual screening tasks (p << 0.0001 for all enrichment factors), with the largest gains arising from its ability to penalise geometrically implausible poses. Constrained energy minimisation of training structures proves strongly beneficial for the derivation of KBPs, while our train-test leakage analysis reveals that overfitting is an underestimated and understudied issue for KBPs. DESPOT provides a data-driven framework for direction-aware modelling of protein-ligand interactions, with applications in pose scoring, binding-site characterisation, and structure-based design.

18
A Quantum Lens on Molecular Design: A Machine-Learned Energy Function from Interacting Quantum Atoms.

Hoffmann, M.; Kazimir, A.; Oesterreich, T.; Kaermer, L.; Engelberger, F.; Meiler, J.; Lamers, C.

2026-03-05 bioinformatics 10.64898/2026.03.03.709242 medRxiv
Top 0.1%
38.3%
Show abstract

Accurate predictions of the interactions (covalent bonds and non-covalent contacts between atoms) in a molecular system require scalable, accurate, and interpretable energy functions. While classical force fields and knowledge-based energy functions struggle to capture key electronic effects, quantum chemistry approaches such as density functional theory (DFT) provide the necessary accuracy but remain computationally demanding. Furthermore, gaining insight into interactions requires energy decomposition schemes. The Interacting Quantum Atoms (IQA) scheme is exceptionally attractive, offering a chemically intuitive, electron density (ED) topologically based separation into intra- and interatomic contributions; however, its high computational cost remains a significant barrier for application to larger systems or tasks like ligand screening in drug discovery. We address these limitations by introducing a novel machine learning (ML) framework to predict accurate energies derived from the IQA scheme together with a comprehensive dataset of molecular systems and their calculated IQA decomposed energies. It enables the rapid and accurate prediction of DFT single point energies and dissects these energies in a physically meaningful and chemically intuitive manner. Our method predicts all intra-atomic energies and inter-atomic interaction energies (covalent and non-covalent) within a defined distance cutoff, providing an energy function that decomposes the total energy into specific atomic contributions. This advance makes the IQA method viable for analyzing interaction energies in applications previously inaccessible due to computational expense, such as elucidating ligand-binding mechanisms and informing rational drug design.

19
Linobectide: a mathematical-chemistry modified black-hole algorithmic framework for ORF1p inhibitor design

GRIGORIADIS, I.

2026-05-08 biophysics 10.64898/2026.05.06.723314 medRxiv
Top 0.1%
37.6%
Show abstract

Computer-aided drug design for conditional biomolecular interfaces requires evaluation across more than one receptor structure, docking pose, or scalar score. LINE-1 ORF1p is treated here as a state-family interface target whose relevant behavior is distributed across receptor microstates, assembly-compatible contact neighborhoods, ligand conformers, and perturbation snapshots. This article presents Linobectide as a mathematical-chemistry CADD workflow centered on a modified black-hole algorithm (MBHA) for persistence-weighted prioritization of putative ORF1p inhibitor candidates. Each molecule is represented as a dossier containing standardized descriptors, docking annotations, interaction-class persistence vectors, finite-action stability traces, graph-localization summaries, SPECTRAL-SAR applicability-domain records, and rank-shift diagnostics. The revised analysis emphasizes numerical reporting endpoints: fixed run parameters, baseline comparators, ablation metrics, rank stability, regeneration fractions, protected-elite fractions, and reproducibility indices. Docking is used as an annotation layer rather than as a stand-alone proof of inhibition. The framework is therefore reported as a transparent computational prioritization protocol that generates testable hypotheses for future biochemical and cellular validation, not as experimental proof of ORF1p inhibition or therapeutic activity.

Author summary: Drug-design workflows can become over-dependent on the best docking pose even when an interface target remains functional through alternative contact corridors. Linobectide addresses this issue by ranking candidates only after docking annotations are aggregated across receptor-state and perturbation conditions. The MBHA search promotes a candidate when interaction persistence, finite-action stability, graph localization, SPECTRAL-SAR coherence, applicability-domain support, and reproducibility checks are concordant. The revision removes unsupported claims of performance advantage and replaces them with benchmarkable endpoints that can be compared with docking-only, consensus-docking, and ablated MBHA baselines. The SI Appendix is retained as a figure atlas for state-family construction, graph-localization diagnostics, docking provenance, consensus geometry, and comparative triage.

20
PRISM: A High-Throughput Simulation Infrastructure for CADD Agents

Shi, Z.; Gao, X.; Xu, M.; Zhu, X.; Wang, P.; Yang, Y.; Yang, Z.; Zhou, R.

2026-04-06 biophysics 10.64898/2026.04.02.716083 medRxiv
Top 0.1%
34.8%
Show abstract

Despite rapid progress in AI agents for computer-aided drug design (CADD), protein-ligand simulation workflows remain fragmented across disparate tools, creating a major bottleneck for scalable candidate evaluation. Here, we present PRISM (Protein-Receptor Interaction Simulation Modeler), a Python platform built on GROMACS that unifies ligand parameterization across multiple force fields, automated system construction, enhanced sampling, multi-tier binding free energy estimation, and trajectory analysis within a single workflow. Through the Model Context Protocol (MCP), PRISM further serves as the computational infrastructure for CADD-Agent, an expert-workflow-driven AI agent designed to orchestrate hierarchical drug screening pipelines. As a pilot application, we applied PRISM to riboflavin synthase and demonstrated end-to-end automation from candidate library assembly to binding pocket characterization, identifying a potential allosteric inhibition site at the oligomerization interface. Together, these results establish PRISM as a high-throughput simulation infrastructure for agent-enabled CADD.